A preference study was made to assess the relative annoyance values
of slope-overload distortion and granular noise in delta-modulated speech.
A recently described adaptive delta modulator was simulated at sampling
frequencies of 20 and 40 kHz, and controlled amounts of the two types of
degradation were introduced into samples of a 2-second utterance. Rankings
were obtained for these samples on the basis of preference judgments of
nine listeners, each of whom assessed the samples, pairwise, in a
tournament-type strategy. Results indicate that the speech sample
exhibiting the minimum degradation on an objective, overall-noise-power
basis is not subjectively the most preferred sample. Furthermore, the
subjectively optimum delta modulator exhibits greater overload and lesser
granularity than the objectively optimum device.

I. INTRODUCTION 

The principle of delta modulation [1] has been widely described in the
literature. Briefly, delta modulation is a digital encoding strategy
which uses a simple feedback mechanism to produce a "staircase"
approximation to an input signal. A block diagram of the simplest form
of delta modulation appears in Fig. 1. The input sequence $\{X_r\}$ is
usually band-limited and suitably oversampled. The "staircase"
sequence $\{Y_r\}$ is generated according to the equations

$$C_r = \mathrm{sgn}(X_r - Y_{r-1}) \tag{1}$$

$$Y_r - Y_{r-1} = m_r = \Delta_r C_r. \tag{2}$$

The step size $\Delta_r$ is assumed to be a constant in conventional
(linear) delta modulation. "Adaptive" delta modulation, on the other hand,
allows for modifications of $\Delta_r$ in accordance with the changing
slope characteristics of the input signal. Such adaptation results in
better encoding, and several types of adaptive delta modulation have been
described in the literature [2, 3, 4].

Fig. 1. Schematic diagram of a linear delta modulator. (The diagram forms
the difference $T_r = X_r - Y_{r-1}$ and outputs $C_r = \mathrm{sgn}\, T_r$.)
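As a minimal sketch of equations (1) and (2) with a constant step size, the following Python function produces the binary sequence and the staircase for a list of input samples. The function name, the starting value `y0`, and the convention that a zero difference is coded as +1 are assumptions made for illustration; the paper does not specify them.

```python
def sgn(v):
    # Assumption: a zero difference is coded as +1.
    return 1 if v >= 0 else -1


def linear_delta_modulate(x, step, y0=0.0):
    """Return (bits, staircase) for input samples x, using a constant step."""
    bits, staircase = [], []
    y_prev = y0
    for x_r in x:
        c_r = sgn(x_r - y_prev)      # equation (1)
        y_r = y_prev + step * c_r    # equation (2) with a constant step size
        bits.append(c_r)
        staircase.append(y_r)
        y_prev = y_r
    return bits, staircase
```

On a steep ramp such as `[0.0, 1.0, 2.0, 3.0]` with `step=0.5`, the staircase falls progressively behind the input (slope overload); on a flat input the bits alternate in sign (granularity).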

Figure 2 illustrates the mechanism of an adaptive delta modulator
and demonstrates how suitable increases and decreases of step size
facilitate better encoding during steep and flat regions of the input
signal waveform. Such adaptations can be effected by observations on a
"recent" segment of the binary sequence $\{C_r\}$; this is illustrated by
equation (5) in the sequel.

Figure 2 also brings out the distinction between two types of encoding
error in delta modulation, viz., "granular noise" and "slope-overload"
distortion. A given error sample

$$E_r = X_r - Y_r \tag{3}$$

can be defined to fall into the granular or slope-overload category,
depending on whether the corresponding step $m_r$ crosses the input
waveform or not. Thus, in Fig. 2, there is a "granular" error $E_{tG}$ at
the sampling instant $t$, and an "overload" error $E_{(t-1)O}$ at the
sampling instant $(t - 1)$. As a matter of definition, we will note that
$E_{tO} = E_{(t-1)G} = 0$.

Fig. 2. Illustration of adaptive delta modulation.
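The classification just described can be sketched in Python as follows. This is an illustration under a stated assumption: the step is taken to "cross" the input when the new staircase value has reached or passed the current input sample, which is a sampled-data reading of the criterion and not a definition taken from the paper. The function name and the `y0` argument are hypothetical.

```python
def split_errors(x, staircase, y0=0.0):
    """Split the error of equation (3) into granular and overload sequences.

    Exactly one of the two entries can be nonzero at any sampling instant.
    """
    granular, overload = [], []
    y_prev = y0
    for x_r, y_r in zip(x, staircase):
        e_r = x_r - y_r                 # equation (3)
        step = y_r - y_prev             # the step m_r of equation (2)
        # Assumption: the step crosses the input if the staircase has
        # reached or passed the input sample in the direction of the step.
        crossed = (y_r - x_r) * step >= 0
        granular.append(e_r if crossed else 0.0)
        overload.append(0.0 if crossed else e_r)
        y_prev = y_r
    return granular, overload
```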

The signal output $\{Z_r\}$ of the delta modulator is actually obtained by
filtering the staircase sequence $\{Y_r\}$ to the input signal band. Let
$\{X_r^F\}$ be the result of passing $\{X_r\}$ through the same lowpass
filter. A perceptually relevant measure of signal degradation is
accordingly defined by the encoding error

$$e_r = X_r^F - Z_r. \tag{4}$$

As with the quantity $E_r$ in (3), one can distinguish samples of
granularity and slope overload, $e_{rG}$ and $e_{rO}$, in the error
sequence $\{e_r\}$. Referring to Fig. 2 once more, it can be seen that a
physical distinction between the two types of error is suggested.
Granularity can be described as a "signal-uncorrelated" random noise-type
of phenomenon. It is characterized by alternation of signs and tends to be
independent of signal amplitude. Slope overload, on the other hand, can be
described as a "signal-correlated" distortion, since its sign and magnitude
are related to the slope of the signal. This physical difference between
slope overload and granularity suggests a corresponding perceptual
distinction and raises the question of the relative annoyance values of the
two forms of signal degradation in delta modulation. The present paper
describes a study of this question as it applies to the delta modulation of
a speech signal.

Earlier work in this subject is in the form of a perceptual experiment [5]
in which H. Levitt, et al., characterized the perceptibility of
slope-overload distortion as such. As mentioned earlier, our paper will
seek to answer the complementary question of the relative perceptibilities
of slope overload and granularity when they occur simultaneously in
delta-modulated speech, as they usually do.

The approach we used was to vary the relative amount of slope
overload and granularity introduced into samples of a test utterance,
to evaluate these samples on the basis of both objective and perceptual
criteria, and then to interpret these evaluations with specific
reference to the overload-granularity dichotomy.

Section II summarizes the salient features of a computer-simulated
adaptive delta modulator that was utilized in the present study. This
adaptive encoder has been recently described and shown to provide
toll-quality speech reproduction at bit rates of practical importance [4].

Section III defines the objective measures of speech quality used in 
our study, while Section IV defines a subjective measure of preference 
and describes an underlying perceptual experiment. 



II. DESCRIPTION OF THE DELTA MODULATOR 

Figure 3 is a schematic block diagram of the adaptive delta modulator
utilized in the present study. This encoder is defined by the basic
equations (1) and (2), and by the adaptation rule

$$\Delta_r = \begin{cases} P\,\Delta_{r-1} & \text{if } C_r = C_{r-1} \\ \dfrac{1}{P}\,\Delta_{r-1} & \text{if } C_r \neq C_{r-1} \end{cases} \qquad P \geq 1. \tag{5}$$



Notice that a conventional (linear) delta modulator corresponds to 
the special case of P = 1. In our study the value of P was a variable 
parameter; different (delta-modulated) speech samples corresponded to 
different suitably spaced values of P, and thereby to different mixtures 
of slope-overload and granularity. 
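A minimal Python sketch of an encoder of this kind is given below. It assumes that the step size is multiplied by P when successive bits agree and divided by P when they differ, so that P = 1 reduces to the linear case; the initial step `step0`, initial staircase value `y0`, and initial bit `c0` are hypothetical starting conditions that the paper does not state.

```python
def adaptive_delta_modulate(x, p, step0, y0=0.0, c0=1):
    """Return (bits, steps, staircase) for a one-bit-memory adaptive encoder."""
    bits, steps, staircase = [], [], []
    y_prev, c_prev, step_prev = y0, c0, step0
    for x_r in x:
        c_r = 1 if x_r - y_prev >= 0 else -1          # equation (1)
        # Assumed adaptation rule: grow the step by P on equal successive
        # bits, shrink it by 1/P on unequal successive bits.
        step_r = step_prev * p if c_r == c_prev else step_prev / p
        y_r = y_prev + step_r * c_r                   # equation (2)
        bits.append(c_r)
        steps.append(step_r)
        staircase.append(y_r)
        y_prev, c_prev, step_prev = y_r, c_r, step_r
    return bits, steps, staircase
```

Sweeping `p` over a set of values and encoding the same utterance each time would yield stimuli with different mixtures of overload and granularity, in the spirit of the procedure described above.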

The original speech sample $X$ was a 2-second male utterance of "Have
you seen Bill?" that had been band-limited to 3.3 kHz. The delta
modulation was performed at sampling rates of 20 and 40 kHz. The
latter frequency provides speech reproduction that approaches telephone
quality [4]. The lower sampling rate was included to provide a better
demonstration of the annoyance properties of delta-modulated speech.

Fig. 3. Schematic diagram of an adaptive delta modulator.

III. OBJECTIVE MEASURES OF SPEECH QUALITY 

We recall the encoding error $e_r$ of equation (4), and define the
following measures of delta-modulator performance. (Summations are over
the entire length of the speech utterance, and a nonzero granularity error
at $r = t$ implies zero overload error, and vice versa.)

(i) The overload-noise energy in $Z$:

$$N_O = \sum_r e_{rO}^2 \tag{6}$$

(ii) The granular-noise energy in $Z$:

$$N_G = \sum_r e_{rG}^2 \tag{7}$$

(iii) The signal-to-noise ratio:

$$\mathrm{SNR} = \frac{\sum_r (X_r^F)^2}{\sum_r e_{rO}^2 + \sum_r e_{rG}^2} \tag{8}$$

(iv) The signal-to-granular (overload)-noise ratio:

$$\mathrm{SNR}_{G(O)} = \frac{\sum_r (X_r^F)^2}{N_{G(O)}} \tag{9}$$
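The four measures can be computed directly once the error sequence has been split into its granular and overload parts. The sketch below assumes that the signal energy is the sum of squares of the filtered input and that the two error sequences are already available as equal-length lists; the function name and the dictionary keys are illustrative only.

```python
def objective_measures(xf, e_granular, e_overload):
    """Compute the noise energies and ratios of equations (6) through (9)."""
    def ratio(num, den):
        # Report an infinite ratio when a noise component is absent.
        return num / den if den > 0 else float("inf")

    signal = sum(v * v for v in xf)          # assumed signal energy
    n_o = sum(v * v for v in e_overload)     # equation (6)
    n_g = sum(v * v for v in e_granular)     # equation (7)
    return {
        "N_O": n_o,
        "N_G": n_g,
        "SNR": ratio(signal, n_o + n_g),     # equation (8)
        "SNR_O": ratio(signal, n_o),         # equation (9), overload
        "SNR_G": ratio(signal, n_g),         # equation (9), granular
    }
```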

IV. A SUBJECTIVE MEASURE OF SPEECH QUALITY 

The perceptual evaluations of this paper are based on the pooled*
judgments of nine listeners, each of whom assessed speech stimuli†
in six runs of a perceptual experiment. Each of these 54 experiments
was a double-elimination tournament‡ (with a different, random,
starting line-up). Matches in each tournament were between contending
stimuli, playing two at a time. The result of each match was in the
form of a binary preference judgment by the listener, while the result
of a tournament was a set of scores awarded to each of the contesting
speech stimuli on the basis of its record in the tournament. The actual
scoring rule§ was one which, together with the double-elimination
strategy, provided a useful alternative (as concluded from a separate
simulation) to the more comprehensive testing procedure where every
contending stimulus would be pitted against every other.

* Intralistener variations were found to be less significant than the
intrastimulus differences.

† The number of contending speech stimuli was also nine, at each sampling
rate.

‡ The tournament ended when every losing contestant had lost twice.

§ In the course of each tournament a contestant accumulated a score as
follows. No score was earned for a match that was lost; while, after every
match that was won, the contestant's score was the sum of the accumulated
scores, before the match, of the contestant and of the loser, plus one.
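The tournament and its scoring rule can be sketched as follows. The paper specifies only the random starting line-up, the stopping condition (every losing contestant has lost twice), and the score update for a win; the pairing of neighbours in the line-up, the handling of an odd contestant by a bye, and all names in the code are assumptions made for illustration.

```python
import random


def double_elimination(stimuli, prefer, rng):
    """Run one tournament and return a dict of accumulated scores.

    prefer(a, b) returns whichever of a or b the listener prefers.
    """
    lineup = list(stimuli)
    rng.shuffle(lineup)                      # random starting line-up
    score = {s: 0 for s in lineup}
    losses = {s: 0 for s in lineup}
    alive = lineup
    while len(alive) > 1:
        next_round = []
        # Assumed pairing: neighbours in the current line-up play each other.
        for i in range(0, len(alive) - 1, 2):
            a, b = alive[i], alive[i + 1]
            winner = prefer(a, b)
            loser = b if winner == a else a
            # Scoring rule: own score plus the loser's score, plus one.
            score[winner] += score[loser] + 1
            losses[loser] += 1
            next_round.append(winner)
            if losses[loser] < 2:            # eliminated after two losses
                next_round.append(loser)
        if len(alive) % 2:                   # odd contestant receives a bye
            next_round.append(alive[-1])
        alive = next_round
    return score
```

With a deterministic judge, for example `prefer=lambda a, b: min(a, b)`, the stimulus that is always preferred never loses and finishes with the highest score. A probabilistic `prefer` would model the random responses discussed in the text that follows.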

It was recognized, however, that both the scoring rule and the
double-elimination strategy were empirical procedures. This was more so
because they were applied to what was apparently a probabilistic
environment: the binary preference-response of a listener to a given pair
of contending stimuli can well be random, especially when the stimuli
are not obviously different. It was, therefore, decided not to emphasize
the actual scores obtained in the perceptual test. Instead, they were used
only to extract crude ranking information that would be less
sensitive to the testing and scoring procedures.

Consequently, the following subjective preference value $Q$ was
assigned to each of $M$ contesting speech stimuli:

$$Q = \text{[expression illegible in the source]}; \qquad R = 1, 2, \ldots, M \tag{10}$$

where $R$ is the rank assigned to a stimulus on the basis of its accumulated
score in the 54 runs of the perceptual test.

V. SUMMARY OF RESULTS 

Figure 4 displays normalized values of the objective measures of
quality SNR, $\mathrm{SNR}_G$, and $\mathrm{SNR}_O$, as well as the
subjective preference function $Q$, as functions of the adaptation
parameter $P$. The following observations emerge:

(i) The speech sample representing the minimum overall-noise-energy
is not subjectively the most preferred sample. In fact, at both 20 and
40 kHz, the objective and subjective optima can be characterized by

$$P_{\mathrm{opt}}^{\mathrm{SUBJ}} = 1.2 \tag{11}$$

$$P_{\mathrm{opt}}^{\mathrm{OBJ}} = 1.5. \tag{12}$$

(ii) The approximate coincidence of the SNR and $\mathrm{SNR}_O$ curves
indicates, by virtue of equations (6) through (9), that

$$N_O \gg N_G \tag{13}$$

for all considered values of $P$.

(iii) The relative disposition of the $\mathrm{SNR}_O$, $\mathrm{SNR}_G$,
and $Q$ curves, and of their maxima, demonstrates that, in spite of the
preponderance (13) of overload in the overall-noise-energy, the granularity
in a speech stimulus has a strong influence on its subjective preference
value.

Fig. 4. Evaluations of delta-modulator performance.
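The reasoning behind observation (ii) can be written out in one line. Let $S$ denote the signal energy that forms the common numerator of the ratios in (8) and (9); the symbol $S$ is introduced here only for brevity. Then:

```latex
\frac{1}{\mathrm{SNR}}
  = \frac{N_O + N_G}{S}
  = \frac{1}{\mathrm{SNR}_O} + \frac{1}{\mathrm{SNR}_G}
```

Hence SNR can nearly coincide with $\mathrm{SNR}_O$ only if $1/\mathrm{SNR}_G$ is negligible beside $1/\mathrm{SNR}_O$, that is, only if $N_G \ll N_O$.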

Note that there is a double peak in the subjective preference curve, 
Q, for the 40-kHz case. This curve unambiguously ranks each of the 
experimental stimuli according to equation (10). However, the actual 
scores underlying this ranking show only a small difference between 
the stimulus with the secondary peak and the one immediately preceding 
it. What is probably indicated is a general broadening of the peak of 
the preference function for values of P between 1.2 and 1.5. 

Table I lists, for the optimal characterizations (11) and (12), values
of $N_{G(O)}$ (given as fractions of signal energy), SNR, and $Q$. Notice
that the subjectively optimum delta modulator displays lesser granularity
($N_G$) and greater overload ($N_O$) than the objectively optimum
modulator. It is again obvious that in perception, overload and granularity
are not weighed in proportion to the respective noise energies $N_O$ and
$N_G$; in fact, the perceptual preference of a speech sample seems to be
determined very strongly by the extent of granularity in it, although the
latter represents a very small fraction of the total noise energy.

Table I. Characteristics of Optimal Adaptive Delta Modulation
($N_O$ and $N_G$ are entered as fractions of signal energy)

Sampling
Frequency   Optimum             P     N_O      N_G       SNR   Q
20 kHz      Subjective (SUBJ)   1.2   0.0216   0.0003    43    1
20 kHz      Objective (OBJ)     1.5   0.0158   0.0004    58    0.81
40 kHz      Subjective (SUBJ)   1.2   0.0022   0.00003   450   1
40 kHz      Objective (OBJ)     1.5   0.0016   0.00004   640   0.91

Finally, Table I indicates that distinctions between objective and
subjective assessments of speech quality appear to be less significant
at the higher sampling rate of 40 kHz; thus, for example, the objectively
best delta modulator has a greater value of subjective preference $Q$
at 40 kHz than at 20 kHz.



VI. CONCLUSION

We have shown that in delta modulation, a speech sample exhibiting 
the minimum degradation on an objective, overall-noise-energy basis 
is not equivalent, in general, to the perceptually most preferred sample. 
We have also indicated that this distinction may be less significant in 
higher quality delta modulation than in a low-bit-rate encoder. 

The subjectively optimum delta-encoder displays a greater overload
$N_O$ and lesser granularity $N_G$ than the objectively best encoder. This
feature, together with the fact that $N_O \gg N_G$ in either case, suggests
the strong influence of granular noise on the perceptual assessment
of a speech sample; equivalently, a lesser "annoyance value" is to be
associated with slope-overload distortion.* A possible explanation of
this observation would be the fact that granularity is explicitly
perceivable by a listener as an "additive background noise," while
slope-overload distortion exists only in relation to an original signal
which is not known to the listener.

* Companding in PCM exploits a similar but not identical subjective
phenomenon, viz., the greater tolerance to encoding errors in regions of
high input amplitude. (Notice, however, that in delta modulation, slope
overload is not confined to high-amplitude regions, nor is granularity
associated only with low input amplitude.)

Finally, our observation that slope overload is "less annoying" than
granularity is to be invoked with caution. Broadly speaking, we believe
that our conclusion would apply very well to speech that achieves or
approaches telephone quality. In extremely low-quality delta modulation
(such as may be used in special applications), on the other hand,
the intelligibility of speech will be a critical criterion; and in such a
situation, depending on other factors like ambient noise at a transmitter,
slope overload may very well become a more important perceptual
attribute.